Abstract- We identify various situations in probabilistic intelligent systems
in which conditionals (rules) as mathematical entities as well as their conditional
logic operations are needed. In discussing Bayesian updating proce¬

1  Introduction

In probabilistic systems, the production rules (if ... then ... rules) connect
events and are quantified by conditional probabilities. With additional
structures, such as conditional independence, the problem seems feasible and
computations are based entirely on the standard calculus of probabilities (see
e.g. Pearl, 1988).

The situation is far from clear when events of interest are conditional
events. In this paper, we will point out situations in which these problems
occur. When we try to extend probabilistic techniques to these situations, we
realize that new objects and tools are needed. It all boils down to modeling
if ... then ... rules in some appropriate fashion that remains compatible with
conditional probability evaluations.


2  Why  do  we  need  a  mathematical  concept 
of  conditional  events? 


To be specific, propositions or events are viewed as elements of a $\sigma$-algebra
$\mathcal{A}$ of subsets of a universe of discourse $\Omega$. The pair $(\Omega, \mathcal{A})$ thus denotes a
measurable space. We use letters $a, b, c, \ldots$ to denote elements of $\mathcal{A}$. Set
operations are: $\wedge$ (or simply juxtaposition, for intersection), $\vee$ (union), $(\cdot)'$ (complement),
$\le$ (set-inclusion), $\emptyset$ (empty set).

There is more than one way to quantify a rule of the form “if b then a”
by probabilities. In the context of two-valued logic, this rule, symbolized as
$b \to a$, is interpreted as material implication, that is $b \to a = b' \vee a$, which
is an element of $\mathcal{A}$. If $P$ is a probability measure on $\mathcal{A}$, then the strength of
the rule $b \to a$ can be taken as $P(b' \vee a)$. See e.g. Nilsson, 1986.

However, due to the meaning, as well as to the uncertainty involved, the
quantification of $b \to a$ is via conditional probability, that is
$P(b \to a) = P(a \mid b)$, provided $P(b) > 0$. If we take this viewpoint, then $b \to a$ cannot be
modeled by material implication, since $P(a \mid b) \ne P(b' \vee a)$, in general. More
importantly, if $b \to a$ is quantified by $P(a \mid b)$, then $b \to a$ cannot be an
element of $\mathcal{A}$. This is known as Lewis’ triviality result (Lewis, 1976). In probabilistic
systems (see e.g. Pearl, 1988), the modeling of causal relationships
among variables of interest (in some knowledge domain) seems unnecessary.
That is, one does not need to define $b \to a$ as some mathematical entity. In
contrast, relations among variables, such as conditional independence, and
the assignment of conditional probabilities to rules (expressed in a natural
language), as well as prior probabilities, suffice to specify a joint probability
distribution on all variables involved, so that probabilistic inference can
be carried out. This is somewhat similar to situations in probability and
statistics: the quantity $P(a \mid b)$ stands for $P_b(a)$, where $P_b$
is a probability measure on $\mathcal{A}$, defined as $P_b(a) = P(ab)/P(b)$. Although
DeFinetti (1974) did consider $(a \mid b)$ as a mathematical entity, namely an
object with three “truth”-values: true (when both a and b are true), false
(when a is false and b is true) and undetermined (when b is false), this observation
does not contribute anything new to probability and statistics. It is
interesting to point out that, in the same vein, as far as we know, the concept
of “conditional random variables” was mentioned only in Wilks (1963), in
an intuitive setting.
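
The gap between the two quantifications of a rule can be checked on a small finite example. The following is a minimal sketch, assuming a hypothetical fair six-sided die as the universe and two illustrative events a and b; the names `omega`, `prob`, `a` and `b` are not taken from the text.

```python
from fractions import Fraction

# Hypothetical universe: one roll of a fair six-sided die.
omega = frozenset(range(1, 7))
P = {w: Fraction(1, 6) for w in omega}

def prob(event):
    """Probability of an event, i.e. of a subset of omega."""
    return sum((P[w] for w in event), Fraction(0))

a = frozenset({2, 3})      # consequent: the roll is 2 or 3
b = frozenset({2, 4, 6})   # antecedent: the roll is even

# Material implication b' v a is an ordinary event of the sigma-algebra.
p_material = prob((omega - b) | a)

# Conditional probability P(a | b) = P(ab) / P(b), defined since P(b) > 0.
p_conditional = prob(a & b) / prob(b)

print(p_material, p_conditional)   # 2/3 1/3
```

In this example the two values differ. More generally, the material-implication value is never smaller than the conditional probability, since $1 - P(a'b) \ge 1 - P(a'b)/P(b)$.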

The common point is this. While one is free to ask questions and pursue
mathematical investigations, the results obtained will be marginal and
hence ignored if they do not lead to advances in applications. See Goodman,
Nguyen and Walker (1992) for a history of the mathematical investigations
of conditional events.

As we will see, it turns out that the need to model conditional events
or production rules as mathematical entities (as opposed to primitives in
natural languages, as in Adams, 1975, or in the general discussions in the
philosophical community) is apparent in the field of expert systems where,
adopting Bayesian methodology, one insists on using probabilistic techniques
for the management of uncertainty. This is essentially due to the fact that
intelligent systems are concerned with reasoning with knowledge. Not
only can knowledge be represented in different forms, but it is, in general,
expressed in some conditional form.

In the following, we will illustrate the above need. Recall that we write
$b \to a$ for “if b then a”, and use $P(a \mid b)$ to specify the strength of this rule.

(i) This example is inspired by Adams (1992). Consider a box containing
red, blue and white balls with unknown proportions. We are interested
in the probability of getting a blue ball on the first draw of a ball
from this box. Suppose that we learn the information “there are many more
blue balls than white balls” (or even more precise numerical information,
such as $P(\text{blue} \mid \text{not red}) \approx 0.99$). Let us examine the heuristic expression
$P(\text{blue} \mid (\text{blue} \mid \text{not red}))$.

As emphasized in Adams (1992), the above expression cannot be written
in standard probability theory, since the antecedent (blue | not red) (or not
red $\to$ blue) is not yet defined mathematically, and moreover, as mentioned
earlier, even if it can be defined, it does not belong to the domain of $P$. Thus,
first we need to model (not red $\to$ blue), and then (not red $\to$ blue) $\to$ blue as
an iterated conditional. Once this task is completed, we still have to specify
an associated probability measure on the new space of conditionals to give a
rigorous formulation of the above heuristic expression.

Let us pursue this example a little further. In expert systems, we usually
have several rules, say “$b_i \to a_i$”, $i = 1, 2, \ldots, n$. To evaluate the probability
of some event of interest $c$ from this rule-base, we formally write

$$P(c \mid (b_1 \to a_1) \text{ and } (b_2 \to a_2) \text{ and } \ldots \text{ and } (b_n \to a_n)).$$

The combination of rules, say, via the logical connective “and” can be carried
out if “and” is specified. This is basically the problem of “reasoning with
conditional knowledge”, in which we need to specify a logic, that is, an
algebraic structure of conditionals.

(ii) A basic inference principle in rule-based systems is modus ponens.
In the two-valued logic framework, where $b \to a = b' \vee a$, we deduce $a$ if the
evidence $b$ holds. This is because here the partial order $\le$ (set inclusion) is
precisely the entailment relation. Specifically:

$$(b \to a)b = (b' \vee a)b = ab \le a.$$

When the evidence $c \ne b$, one can obtain a degree of uncertainty on $a$ by
computing $P[(b' \vee a)c]$.
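
The two-valued version of modus ponens can be traced on a finite universe. The sketch below assumes a hypothetical fair die and illustrative events a, b, c (all names are ours); it checks that $(b' \vee a)b = ab \le a$ and computes $P[(b' \vee a)c]$ for an evidence c different from b.

```python
from fractions import Fraction

# Hypothetical universe: one roll of a fair six-sided die.
omega = frozenset(range(1, 7))
P = {w: Fraction(1, 6) for w in omega}

def prob(event):
    """Probability of an event, i.e. of a subset of omega."""
    return sum((P[w] for w in event), Fraction(0))

def implies(b, a):
    """Material implication b -> a, represented as the event b' v a."""
    return (omega - b) | a

a = frozenset({2, 3})
b = frozenset({2, 4, 6})
c = frozenset({1, 2, 3, 4})   # evidence that differs from b

# With evidence b: (b' v a) b = ab, which is included in a.
with_b = implies(b, a) & b
print(with_b == a & b, with_b <= a)   # True True

# With evidence c != b: a degree of uncertainty P[(b' v a) c].
p_with_c = prob(implies(b, a) & c)
print(p_with_c)                       # 1/2
```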

Similarly, in fuzzy logic (see e.g. Yager et al., 1987), extending classical
logic, and where $a$, $b$, $c$ become fuzzy sets, the conclusion of modus ponens
takes the form $(b \to a)c$, describing a new fuzzy set in which “conjunction” is
chosen as some t-norm (e.g. minimum operator), and the fuzzy implication
$b \to a$ is interpreted using some truth table for the fuzzy implication (binary)
operator $\to$.
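
One possible pointwise reading of the expression $(b \to a)c$ is sketched below. It rests on several assumptions that the text does not fix: the three fuzzy sets live on the same finite universe, the implication operator is the Łukasiewicz one, $\min(1, 1 - x + y)$, and the t-norm is the minimum. All names and membership degrees are hypothetical.

```python
def lukasiewicz_implication(x, y):
    """Lukasiewicz implication on membership degrees in [0, 1] (an assumed choice)."""
    return min(1.0, 1.0 - x + y)

def fuzzy_conclusion(b, a, c, implication=lukasiewicz_implication, t_norm=min):
    """Pointwise membership of (b -> a) c, with conjunction given by the t-norm."""
    return {w: t_norm(implication(b[w], a[w]), c[w]) for w in b}

# Hypothetical membership degrees on a three-point universe.
b = {"x1": 1.0, "x2": 0.5, "x3": 0.0}
a = {"x1": 0.25, "x2": 0.75, "x3": 0.5}
c = {"x1": 1.0, "x2": 0.5, "x3": 0.75}

result = fuzzy_conclusion(b, a, c)
print(result)   # {'x1': 0.25, 'x2': 0.5, 'x3': 0.75}
```

Other implication operators or t-norms can be passed through the `implication` and `t_norm` parameters.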

However, if we insist on the quantification $P(b \to a) = P(a \mid b)$, we have
to proceed differently, again, since $b \to a$ will no longer be $b' \vee a$.

Writing $b \to a$ as a conditional object $(a \mid b)$, and identifying $c$ with
$(c \mid \Omega)$, we can form $(a \mid b)(c \mid \Omega)$, where the conjunction of conditionals needs to
be specified. From that, a computation of probability is possible.

(iii) As mentioned in Goldszmidt and Pearl (1992), the rule base of an
expert system might contain a rule of the form “If ($b \to a$) then ($d \to c$)”,
symbolized as $(a \mid b) \Rightarrow (c \mid d)$.
It is obvious that to quantify this rule by conditional probability, i.e.
computing $P[(a \mid b) \Rightarrow (c \mid d)]$, we first have to define objects like $(a \mid b)$.
Next, $\Rightarrow$ is some implication operator among these conditional objects, which
can be derived from logical operations among these objects. Finally, what
probability measure on the space of conditionals should be used to quantify
the rule?

3  Bayesian  updating  and  belief  construction 

Before going into probabilistic inferences such as Bayesian updating procedures
and combination of evidence using belief functions (Shafer, 1976), let
us outline briefly previous efforts on formulating a mathematical theory of
conditionals (see e.g. Goodman, Nguyen and Walker, 1991, for details).

Consider again a measurable space $(\Omega, \mathcal{A})$. In view of Lewis’ triviality
result, there is no binary operation $\to$ from $\mathcal{A} \times \mathcal{A}$ to $\mathcal{A}$ (where $\times$ denotes
cartesian product) such that for any $a, b \in \mathcal{A}$, and any probability measure
$P$ on $\mathcal{A}$ with $P(b) > 0$, one has

$$P(b \to a) = P(a \mid b).$$

Thus, in modeling the rule $b \to a$, whose quantification is $P(a \mid b)$, one has
to go “outside” of $\mathcal{A}$.

One axiomatic derivation leads to a representation of $b \to a$ as an “interval”
in $\mathcal{A}$ (see also Nguyen, 1992), namely

$$b \to a = \{x \in \mathcal{A} : ab \le x \le b' \vee a\} = [ab,\ b' \vee a] \text{ for short.}$$

For an interval $[a, b]$ with $a = b$, we identify $[a, a]$ with $a$; we then see that, in general, $b \to a$ lies
outside of $\mathcal{A}$.

It is easy to check that $[a, b] = (b' \vee a) \to a$ (since $a \le b$), hence the space
of all closed intervals, denoted as $\mathcal{A} \mid \mathcal{A}$, is precisely that of all conditionals
$b \to a$.
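
The interval representation can be enumerated explicitly when the universe is finite. The following sketch assumes a hypothetical four-point universe with the power set as $\sigma$-algebra; the helper names `conditional` and `members` are ours. It lists the events in $[ab, b' \vee a]$ and checks the identity $[a, b] = (b' \vee a) \to a$ for every interval.

```python
from itertools import combinations

# Hypothetical four-point universe; events are frozensets of outcomes.
omega = frozenset({1, 2, 3, 4})

def powerset(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def conditional(a, b):
    """Endpoints (ab, b' v a) of the interval representing b -> a."""
    return (a & b, (omega - b) | a)

def members(interval):
    """Every event x with lower <= x <= upper, where <= is set inclusion."""
    lower, upper = interval
    return [x for x in powerset(omega) if lower <= x <= upper]

a, b = frozenset({1, 2}), frozenset({2, 3})
rule = conditional(a, b)
print(len(members(rule)))   # 4 events lie between ab = {2} and b' v a = {1, 2, 4}

# Every interval [lo, hi] with lo <= hi is itself a conditional: (hi' v lo) -> lo.
all_recovered = all(
    conditional(lo, (omega - hi) | lo) == (lo, hi)
    for lo in powerset(omega) for hi in powerset(omega) if lo <= hi
)
print(all_recovered)        # True
```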

This space contains $\mathcal{A}$ strictly. Contrary to a statement in Gilio and
Spezzaferri (1992), these conditionals are equivalent to DeFinetti’s conditional
events. To see this, viewing $\Omega$ as the set of all models in a logical
setting, DeFinetti’s conditional event $(a \mid b)$ is identified with the generalized
indicator function (see Schay, 1968)

$$(a \mid b)(\omega) = \begin{cases} 1 & \text{for } \omega \in ab \\ 0 & \text{for } \omega \in a'b \\ u & \text{for } \omega \in b'. \end{cases}$$

It is obvious that such functions are in one-to-one correspondence with elements
of $\mathcal{A} \mid \mathcal{A}$, since they specify $b'$ and $ab$, and conversely.
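
The correspondence between three-valued indicator functions and intervals can be made concrete as follows. This is a minimal sketch on a hypothetical four-point universe, with the undetermined value encoded as the string "u" (an encoding chosen here for illustration).

```python
# Hypothetical four-point universe; events are frozensets of outcomes.
omega = frozenset({1, 2, 3, 4})

def indicator(a, b):
    """Three-valued indicator of (a | b): 1 on ab, 0 on a'b, 'u' on b'."""
    return {w: (1 if w in a else 0) if w in b else "u" for w in omega}

def interval_from_indicator(values):
    """Recover the interval [ab, b' v a] from the three-valued function."""
    true_set = frozenset(w for w, v in values.items() if v == 1)        # ab
    undetermined = frozenset(w for w, v in values.items() if v == "u")  # b'
    return (true_set, true_set | undetermined)

a, b = frozenset({1, 2}), frozenset({2, 3})
values = indicator(a, b)
recovered = interval_from_indicator(values)
expected = (a & b, (omega - b) | a)
print(recovered == expected)   # True
```

The upper endpoint is recovered as $ab \vee b'$, which equals $b' \vee a$.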

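In fact, it is precisely through this three-valued logic connection that one can
discover all possible algebraic structures on $\mathcal{A} \mid \mathcal{A}$. For example, Łukasiewicz’
three-valued logic (see e.g. Rescher, 1969) will equip $\mathcal{A} \mid \mathcal{A}$ with interval
operations. That is,

$$[a, b] \wedge [c, d] = [ac,\ bd],$$

$$[a, b] \vee [c, d] = [a \vee c,\ b \vee d].$$

Note that $\mathcal{A} \mid \mathcal{A}$ is not a Boolean algebra since it is not complemented.
Indeed, if $a \le b$ then $b' \le a'$, so the only candidate for a complement of $[a, b]$,
namely $[a', b']$, is not an interval unless $a = b$.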

However, this bounded, distributive lattice has a pseudo-complementation:

$$[a, b]^* = [b', b'],$$

satisfying Stone’s identity

$$[a, b]^* \vee [a, b]^{**} = [1, 1],$$

so that $\mathcal{A} \mid \mathcal{A}$ is a Stone algebra (see e.g. Grätzer, 1978).
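
These lattice properties can be verified by brute force on a small universe. The sketch below assumes a hypothetical three-point universe and represents an interval as a pair of frozensets (lower, upper); the function names are ours. It checks Stone’s identity, the pseudo-complement property, and that only degenerate intervals $[a, a]$ have a complement.

```python
from itertools import combinations

# Hypothetical three-point universe; an interval is a pair (lower, upper).
omega = frozenset({1, 2, 3})

def powerset(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

subsets = powerset(omega)
intervals = [(lo, hi) for lo in subsets for hi in subsets if lo <= hi]

def meet(i, j):
    return (i[0] & j[0], i[1] & j[1])

def join(i, j):
    return (i[0] | j[0], i[1] | j[1])

def pseudo_complement(i):
    """[a, b]* = [b', b']."""
    upper_c = omega - i[1]
    return (upper_c, upper_c)

bottom, top = (frozenset(), frozenset()), (omega, omega)

stone_holds = all(
    join(pseudo_complement(i), pseudo_complement(pseudo_complement(i))) == top
    for i in intervals
)
meet_is_bottom = all(meet(i, pseudo_complement(i)) == bottom for i in intervals)

# Intervals that admit a true complement (meet = bottom and join = top).
complemented = [i for i in intervals
                if any(meet(i, j) == bottom and join(i, j) == top for j in intervals)]
only_degenerate = all(lo == hi for lo, hi in complemented)
print(stone_holds, meet_is_bottom, only_degenerate)   # True True True
```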

The above investigations provide a new mathematical framework for manipulating
conditional information.

While the mathematical concept of a conditional event, or of a production
rule, is well understood, one would like also to consider some other equivalent
representation of $b \to a$ which possesses some “boolean” flavor. This would
be useful as in the following situations.

(i) Suppose that $P$ is a prior probability measure on $(\Omega, \mathcal{A})$. When we
learn that some event $a \in \mathcal{A}$ has occurred, we update our knowledge $P$ by
conditioning on $a$, that is, change $P$ to $P_a$. How can we continue to do so if,
instead of learning $a$, we learn a rule $b \to a$? Viewing $b \to a$ as a conditional
event $[ab,\ b' \vee a]$, how do we make sense of $P_{[ab,\ b' \vee a]}$ as a new probability
measure? The difficulty seems to lie in the fact that $[ab,\ b' \vee a] \in \mathcal{A} \mid \mathcal{A}$,
which is not a Boolean $\sigma$-algebra.

(ii) In the simplest situation of using belief functions to quantify our
degrees of belief (see Shafer, 1976), one can construct a belief function $F$
on $\Omega$ (finite) from the knowledge of $P(a)$, for some subset $a$ of $\Omega$, as follows.
Define the assignment mass function $m : \mathcal{P}(\Omega) \to [0, 1]$, where $\mathcal{P}(\Omega)$ denotes
the power set of $\Omega$, by setting, for $x \in \mathcal{P}(\Omega)$,

$$m(x) = \begin{cases} P(a) & \text{if } x = a \\ 1 - P(a) & \text{if } x = \Omega \\ 0 & \text{otherwise.} \end{cases}$$

And then, as usual, for $y \in \mathcal{P}(\Omega)$,

$$F(y) = \sum_{x \le y} m(x).$$
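
The construction of $F$ from $m$ is short enough to write out. The sketch below assumes a hypothetical three-point universe (ball colours), a proper subset a, and an illustrative value $P(a) = 3/10$; none of these values come from the text.

```python
from fractions import Fraction

# Hypothetical finite universe and event; P(a) = 3/10 is an illustrative value.
omega = frozenset({"red", "blue", "white"})
a = frozenset({"blue"})
p_a = Fraction(3, 10)

# Mass assignment: P(a) on a, 1 - P(a) on omega, 0 elsewhere (assumes a != omega).
mass = {a: p_a, omega: 1 - p_a}

def belief(y):
    """F(y): total mass of the focal sets x included in y."""
    return sum((m for x, m in mass.items() if x <= y), Fraction(0))

print(belief(a))                             # 3/10
print(belief(frozenset({"blue", "white"})))  # 3/10
print(belief(frozenset({"red"})))            # 0
print(belief(omega))                         # 1
```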

As in the Bayesian updating case, suppose that we know $a$, $b$ and $P(a \mid b)$. How
should we proceed to construct an associated assignment mass function?
The difficulty is similar to that in the Bayesian updating case.

In view of the situations above, we are going to investigate, in the next
section, a “booleanization” of conditionals which should provide a new tool
for probabilistic inference with conditional information.

4  A  booleanization  of  conditionals

Recall that a rule of the form $b \to a$ cannot be modeled as an element of the
$\sigma$-algebra $\mathcal{A}$, as long as we want to quantify it by $P(b \to a) = P(a \mid b)$, for
any probability measure $P$ on the measurable space $(\Omega, \mathcal{A})$.

In Section 3, we mentioned the conditional space $\mathcal{A} \mid \mathcal{A}$, strictly larger
than $\mathcal{A}$, which contains the rules $b \to a$ as elements. However, $\mathcal{A} \mid \mathcal{A}$ is not a
$\sigma$-algebra. We are going to search for a $\sigma$-algebra larger than $\mathcal{A} \mid \mathcal{A}$ in which
the rules $b \to a$ are elements.

We start from the following remark of D. Bamber, NRaD (personal communication),

$$P(a \mid b) = \frac{P(ab)}{P(b)} = \frac{P(ab)}{1 - P(b')} = \sum_{n=0}^{+\infty} P(ab)\,[P(b')]^n$$

(using $\frac{1}{1-r} = \sum_{n=0}^{+\infty} r^n$, for $0 \le r < 1$). The term $[P(b')]^n$ suggests a product
measure of the set $b' \times b' \times \cdots \times b'$ ($n$ times), where $\times$ denotes cartesian
product.

Since  n  runs  over  the  set  of  non-negative  integers,  an  infinite  (countable) 
product  space  is  required. 
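
The series expansion can be checked with exact arithmetic. This is a minimal sketch, assuming the illustrative values $P(ab) = 1/6$ and $P(b) = 1/2$ (not from the text); it confirms that the partial sum over $n = 0, \ldots, 19$ falls short of $P(a \mid b)$ by exactly $P(a \mid b)\,[P(b')]^{20}$.

```python
from fractions import Fraction

# Illustrative values (assumed): P(ab) = 1/6 and P(b) = 1/2.
p_ab = Fraction(1, 6)
p_b = Fraction(1, 2)
p_not_b = 1 - p_b
p_cond = p_ab / p_b   # P(a | b) = 1/3

def partial_sum(n_terms):
    """Sum of P(ab) * P(b')**n for n = 0, ..., n_terms - 1."""
    return sum((p_ab * p_not_b ** n for n in range(n_terms)), Fraction(0))

gap = p_cond - partial_sum(20)
print(gap == p_cond * p_not_b ** 20)   # True: the remainder shrinks geometrically
```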

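Thus, let $\hat{\Omega}$ be the infinite cartesian product of $\Omega$, i.e.

$$\hat{\omega} \in \hat{\Omega}, \quad \hat{\omega} = (\omega_1, \omega_2, \omega_3, \ldots), \quad \omega_n \in \Omega, \ n \ge 1.$$

A cylinder in $\hat{\Omega}$ is a subset of $\hat{\Omega}$ of the form $a_1 \times a_2 \times \cdots \times a_n \times \Omega \times \Omega \times \cdots$,
for $n \ge 1$, and $a_i \in \mathcal{A}$, $i = 1, \ldots, n$.

To simplify notation, we write $a_1 \times a_2 \times \cdots \times a_n$ to mean the cylinder
with this base. Thus, for example, $ab$ is viewed as the cylinder $ab \times \Omega \times \Omega \times \cdots$,
and $b' \times b'$ is the cylinder $b' \times b' \times \Omega \times \Omega \times \cdots$, and so on.

Let $\hat{\mathcal{A}}$ be the infinite product $\sigma$-algebra on $\hat{\Omega}$, that is, the smallest
$\sigma$-algebra containing all cylinders of $\hat{\Omega}$.

Let $\hat{P}$ denote the product measure on $(\hat{\Omega}, \hat{\mathcal{A}})$ with identical one-dimensional
marginals $P$, that is

$$\hat{P}(a_1 \times a_2 \times \cdots \times a_n \times \Omega \times \Omega \times \cdots) = P(a_1)P(a_2) \cdots P(a_n), \quad \forall n \ge 1.$$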


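Now, observe that the cylinders $ab$, $b' \times ab$, $b' \times b' \times ab, \ldots$ are pairwise disjoint
in $\hat{\Omega}$. Indeed

$$ab = ab \times \Omega \times \Omega \times \cdots = \{\hat{\omega} = (\omega_1, \omega_2, \omega_3, \ldots) : \omega_1 \in ab,\ \omega_n \in \Omega,\ n \ge 2\},$$

$$b' \times ab = \{\hat{\omega} = (\omega_1, \omega_2, \omega_3, \ldots) : \omega_1 \in b',\ \omega_2 \in ab,\ \omega_n \in \Omega,\ n \ge 3\}$$

(note that $ab$ and $ab \times b'$ are not disjoint).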

Consider the map

$$f : \mathcal{A} \times \mathcal{A} \to \hat{\mathcal{A}}$$

defined by

$$f(a, b) = ab \vee (b' \times ab) \vee (b' \times b' \times ab) \vee \cdots$$

where, by abuse of notation, $\vee$ stands for set union in $\hat{\Omega}$. Note that $f(a, b)$
is a countable union of cylinders, and hence is an element of $\hat{\mathcal{A}}$.




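Since $\hat{P}$ is a probability measure, we have

$$\begin{aligned}
\hat{P}(f(a, b)) &= \hat{P}[ab \vee (b' \times ab) \vee (b' \times b' \times ab) \vee \cdots] \\
&= \hat{P}(ab) + \hat{P}(b' \times ab) + \hat{P}(b' \times b' \times ab) + \cdots \\
&= P(ab) + P(b')P(ab) + P(b')P(b')P(ab) + \cdots \\
&= \sum_{n=0}^{+\infty} P(ab)\,[P(b')]^n = P(a \mid b).
\end{aligned}$$

Thus, the probability space $(\hat{\Omega}, \hat{\mathcal{A}}, \hat{P})$ extends $(\Omega, \mathcal{A}, P)$ in the sense that, for
$a \in \mathcal{A}$, $\hat{P}(f(a, \Omega)) = P(a)$, and for $a, b \in \mathcal{A}$ with $P(b) > 0$,
$\hat{P}(f(a, b)) = P(a \mid b)$.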

In view of this matching, the rule $b \to a$ can be modeled as $f(a, b)$, which
is an event, but in another measurable space.
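
The event $f(a, b)$ has a simple reading in terms of independent draws: a sequence belongs to it exactly when the first coordinate that falls in $b$ also falls in $a$. The simulation below is a sketch under assumptions of ours (a fair die for the marginal $P$, illustrative events, a fixed seed, 20000 trials); it estimates $\hat{P}(f(a, b))$ and should land near $P(a \mid b) = 1/3$.

```python
import random

# Illustrative events on a fair six-sided die (assumed, not from the text).
a = frozenset({2, 3})
b = frozenset({2, 4, 6})

def sequence_in_f(rng):
    """Draw i.i.d. coordinates until one falls in b; the sequence lies in
    f(a, b) exactly when that first coordinate in b also lies in a."""
    while True:
        w = rng.randint(1, 6)
        if w in b:
            return w in a

rng = random.Random(0)   # fixed seed so the run is reproducible
trials = 20000
estimate = sum(sequence_in_f(rng) for _ in range(trials)) / trials
print(abs(estimate - 1 / 3) < 0.02)
```

Only finitely many coordinates are ever inspected, since a draw in $b$ occurs with probability one when $P(b) > 0$.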

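Now, given $b \to a$, we can update $P$ rigorously by $P_{b \to a}$, as in the unconditional
information case. Indeed, we take $P_{b \to a}$ to be $\hat{P}_{f(a,b)} : \hat{\mathcal{A}} \to [0, 1]$,
which is a usual conditional probability measure: for $A \in \hat{\mathcal{A}}$,

$$\hat{P}_{f(a,b)}(A) = \hat{P}[A \wedge f(a, b)] / \hat{P}(f(a, b))$$

where $\wedge$ stands for set intersection in $\hat{\Omega}$. For $c \in \mathcal{A}$, we take
$P_{b \to a}(c) = \hat{P}_{f(a,b)}(c \times \Omega \times \Omega \times \cdots)$.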
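
For an event $c \in \mathcal{A}$, the updated value can be computed in closed form. Intersecting the cylinder of $c$ with each piece of $f(a, b)$ and summing the geometric series gives the numerator $P(abc) + P(b'c)\,P(ab)/P(b)$; this step is our own derivation for the sketch below, not a formula stated in the text, and it requires $P(ab) > 0$. The die, the events and the function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical universe: one roll of a fair six-sided die.
omega = frozenset(range(1, 7))
P = {w: Fraction(1, 6) for w in omega}

def prob(event):
    """Probability of an event, i.e. of a subset of omega."""
    return sum((P[w] for w in event), Fraction(0))

def updated(c, a, b):
    """P_{b->a}(c): probability of c after learning the rule b -> a (needs P(ab) > 0)."""
    p_ab, p_b = prob(a & b), prob(b)
    p_f = p_ab / p_b   # P^(f(a, b)) = P(a | b)
    # n = 0 piece: abc; pieces n >= 1: (b'c) x b' x ... x ab, summed geometrically.
    numerator = prob(a & b & c) + prob((omega - b) & c) * p_ab / p_b
    return numerator / p_f

a = frozenset({2, 3})
b = frozenset({2, 4, 6})
c = frozenset({1, 2})

print(updated(c, a, b))              # 2/3
print(updated(omega, a, b))          # 1
print(updated(c, a, omega) == prob(c & a) / prob(a))   # True: b = omega is ordinary conditioning on a
```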

As a final remark, while the booleanization of conditionals provides a rigorous
framework for probabilistic inference when dealing with conditional information,
the computations might be complicated. It is anticipated that logical
operations among conditionals (viewed as intervals in Boolean $\sigma$-algebras)
can be used as approximations to Boolean operations on $\hat{\Omega}$, and thus reduce
the complexity of computational problems.